Statistical inference using weights and survey design

2. Inference in theory and practice

Pierre Walthéry

UK Data Service

October 2025

This session

  1. Design-based inference from social surveys
  2. Real world vs ideal world inference
  3. What to take into account
  4. Scenarios

1. Design-based inference

Inference

  • Consists in estimating parameters of interest in the population based on the sample (i.e. survey microdata)

    • Point estimates such as means or medians
    • Their degree of precision: confidence intervals or standard errors.
  • We have seen that both estimates are affected by the sample design:

    • by increasing the sampling fraction for some groups, stratification improves the precision of estimates,
    • whereas clustering negatively impacts precision, by in effect reducing the diversity of the sample.
  • As most surveys rely on both, assessing the impact of survey design on the precision of a given estimate is not straightforward, and depends on the quantity and subpopulation of interest.

  • Introductory courses tend to leave out this aspect, which may give a false impression of simplicity to users.

Model vs design-based estimation

  • There are traditionally two main ways to produce population estimates from surveys while accounting for the survey design:

    • either by directly using methods that correct estimates for the characteristics of the sample - the design-based approach
    • or by modelling the effect of the survey design - the model-based approach, which relies on techniques such as multilevel modelling.
  • We will only focus on the design-based approach here as it tends to be more straightforward for estimating common population parameters such as means, medians, proportions, and totals.

Survey weights (1)

  • Even with data producers’ best efforts, no real-world social survey enables unbiased estimation out of the box. Adjustments are necessary in order to reflect under/over-represented population characteristics

    • Survey weights are a special type of variable in survey datasets, whose value reflects the relative ‘importance’ of observations in the sample.
    • They prevent estimates from being biased by compensating for the over/under-representation of certain types of respondents.
    • They are usually higher for observations less likely to be part of the sample (eg young men in urban areas), and lower for those more likely to be part of it (eg women over 50).
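The logic can be sketched in a few lines of R. This is a toy illustration with invented selection probabilities, not the weighting procedure of any actual survey:

```r
# Toy illustration (invented figures): a design weight is the
# inverse of a unit's probability of selection into the sample.
sample_units <- data.frame(
  stratum  = c("urban", "urban", "rural", "rural"),
  prob_sel = c(1/500, 1/500, 1/2000, 1/2000)  # invented probabilities
)
sample_units$design_weight <- 1 / sample_units$prob_sel
sample_units
# Rural units, less likely to be selected, receive larger weights
```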

Example: values of LFS weights

library(haven)
library(dplyr)
lfs<-read_dta("~/Data/lfs/UKDA-8999-stata/lfsp_aj22_eul_pwt22.dta")
summary(lfs$PWT22)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0   496.0   756.0   877.7  1103.0 12355.0 
lfs |> filter(!(AGE < 20 | (AGE > 30 & AGE < 50))) |> group_by(as_factor(SEX), as_factor(AGES)) |> summarize(mean(PWT22))
# A tibble: 16 × 3
# Groups:   as_factor(SEX) [2]
   `as_factor(SEX)` `as_factor(AGES)` `mean(PWT22)`
   <fct>            <fct>                     <dbl>
 1 Male             20-24yrs                  1520.
 2 Male             25-29yrs                  1411.
 3 Male             30-34yrs                  1185.
 4 Male             50-54yrs                   864.
 5 Male             55-59yrs                   756.
 6 Male             60-64yrs                   654.
 7 Male             65-69yrs                   532.
 8 Male             70 and over                725.
 9 Female           20-24yrs                  1437.
10 Female           25-29yrs                  1230.
11 Female           30-34yrs                   997.
12 Female           50-54yrs                   806.
13 Female           55-59yrs                   707.
14 Female           60-64yrs                   608.
15 Female           65-69yrs                   510.
16 Female           70 and over                873.

Region

            as_factor.URESMC. mean.PWT22.
1                 Tyne & Wear    779.1205
2     Rest of Northern region    649.1621
3             South Yorkshire    797.3118
4              West Yorkshire    822.3008
5  Rest of Yorks & Humberside    722.5770
6               East Midlands    782.4144
7                 East Anglia    745.5730
8                Inner London   1789.8896
9                Outer London   1137.6921
10         Rest of South East    862.2485
11                 South West    727.4238
12 West Midlands (met county)   1096.7354
13      Rest of West Midlands    795.7293
14         Greater Manchester    974.5903
15                 Merseyside   1287.3725
16         Rest of North West    826.2745
17                      Wales    802.5099
18                Strathclyde    909.0254
19           Rest of Scotland    847.6807
20           Northern Ireland    262.0993

Survey weights (2)

  • They are usually made up of three components in cross-sectional surveys:

    • a design component that accounts for unequal probabilities of selection of sample members resulting from the survey design
    • a non-response component, correcting for the (known) lower propensity to take part in surveys among certain categories of respondents
    • a calibration or benchmarking component that ensures that weighted demographic variables, such as age, sex and geography, match the current ONS population estimates.
  • These components are sometimes called ‘weights’ individually (eg ‘design weights’), but in practice they are often merged into a single variable.
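The components are typically combined multiplicatively before the final calibration step. A stylised sketch, with all values invented for illustration:

```r
# Stylised sketch (all values invented): the published weight is
# typically the product of the three components.
w_design      <- c(500, 500, 2000)    # inverse selection probabilities
w_nonresponse <- c(1.10, 1.30, 1.00)  # inverse response propensities
w_calibration <- c(0.95, 1.05, 1.02)  # benchmarking to ONS totals
final_weight  <- w_design * w_nonresponse * w_calibration
round(final_weight, 1)
```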

Other kinds of weights

  • In longitudinal surveys there is also a fourth layer of weights, accounting for attrition (ie respondents dropping out of the study). These usually consist of the inverse probability of dropping out at a given wave of data collection.
  • Most of the time they will account for attrition between two or more waves.
  • Other components may sometimes be included: weights for benefit units, weights including or excluding proxy interviews, diary weights in time-use surveys, etc.

Weights included in Understanding Society Wave M

A screenshot displaying all the weights included in the Wave M individual questionnaire dataset of Understanding Society

Survey weights (3)

  • Survey weights may also be rescaled in order to inflate sample counts to population totals thus becoming grossing weights.

  • In that sense, the numerical values of the weights are an indication of the number of units these observations ‘represent’ in the population.

  • The computation of weights relies on calibration algorithms, which:

    • optimise the conditional distribution of observations across the weighting variables

    • (think of a crosstab of three or more variables, eg age, gender, economic activity, region)…

    • … while minimising the standard errors (which depend on the number of observations in each cell of the table)…

    • … and maximising representativeness (making sure that all cells have observations).

  • Using survey weights also affects the precision of estimations - often negatively - and this should be taken into account.
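Because grossing weights inflate sample counts to population totals, their sum over the whole sample approximates the population the survey is calibrated to. With the LFS file loaded as above (and assuming SEX is coded 1 for men, as in the LFS):

```r
# The sum of a grossing weight approximates the population total
# the survey is calibrated to.
sum(lfs$PWT22, na.rm = TRUE)
# A weighted count estimates the population size of a subgroup,
# eg men (assuming SEX == 1 denotes men in the LFS coding):
sum(lfs$PWT22[lfs$SEX == 1], na.rm = TRUE)
```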

Survey design variables

  • Survey design variables usually consist of identifiers used during the sampling process:

    • Strata
    • Clusters, especially Primary Sampling Units (PSUs), that is the clusters drawn during the first stage of sampling.
  • Used in conjunction with weights (with the help of dedicated survey estimation functions in statistical packages), they enable users to produce more accurate estimates of precision than those relying on survey weights alone.

  • Unfortunately, whereas virtually all surveys curated by the UK Data Service include weights, survey design variables are not always provided by data producers, due to concerns about anonymity protection.
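With the R survey package, weights and design variables are declared once and reused by all estimation commands. A sketch: ‘psu’ and ‘strata’ are hypothetical variable names standing in for design variables, which are not released in the EUL version of the LFS.

```r
library(survey)
# Hypothetical declaration: 'psu' and 'strata' stand in for design
# variables, which are not released in the EUL version of the LFS.
lfs_design <- svydesign(ids = ~psu, strata = ~strata,
                        weights = ~PWT22, data = lfs)
# Design-aware estimation then uses the declared object:
svymean(~AGE, lfs_design, na.rm = TRUE)
```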

Design effects and design factors

\(D_{eff}\) and \(D_{eft}\) are two versions of a coefficient meant to quantify how much the standard errors of estimates from a complex sample differ from what they would have been under simple random sampling (Kish 1995).

  • \(D_{eff}\), the design effect, is the ratio of the variance of an estimated parameter given the sample design to the same variance computed under the assumption of SRS. \(D_{eft}\), the design factor, is the square root of the design effect and applies to standard errors rather than variances.

  • \(D_{eft}<1\) denotes a smaller variance than under SRS, that is an improvement in precision, whereas \(D_{eft}>1\) denotes a loss of precision.

  • Can potentially be used to manually correct standard errors and confidence intervals computed under the assumption of SRS.

  • Data producers sometimes publish \(D_{eft}\) to enable users to adjust standard errors as an alternative to including survey design information.
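The manual correction is a simple rescaling. A sketch in R, with all numbers invented for illustration:

```r
# Manual adjustment of SRS results using a published Deft.
# All numbers are invented for illustration.
est    <- 0.43              # point estimate, eg a proportion
se_srs <- 0.010             # standard error computed under SRS
deft   <- 1.2               # design factor published by the producer
se_adj <- se_srs * deft     # design-adjusted standard error
est + c(-1.96, 1.96) * se_adj  # adjusted 95% confidence interval
```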

2. Real world inference

Ideal vs real world inference

  • In an ideal world, high quality inference relying on weights and survey design variables for maximum precision should be the default option.

  • In practice this is not always possible… or even strictly necessary:

    • The necessary variables are not always present
    • The sensitivity of the analysis may not require the highest degree of precision
    • The complexity of the analysis may make full design adjustment impractical
    • Software issues
    • What are we in fact estimating?

Availability of variables

  • UKDS-curated datasets are usually available in one of three main flavours: open, safeguarded (EUL) and controlled:

    • Most data producers include weights…

    • A significant number of safeguarded datasets do not include survey design variables, ie PSU and strata identifiers

    • In some cases, the controlled version of the study does

    • Users are therefore left with the option of either:

      • Using estimation procedures that assume simple random sampling, and being explicit about the likely bias of their estimates…
      • Applying for access to the controlled version of the data, if there is one…
      • Or adjusting standard errors, whenever design factors are published by the data producer.

How precise should my estimates be?

  • It is often assumed that by default, the highest degree of precision should be sought in any analysis.

  • Actual real-world circumstances of statistical analysis differ widely. At two extremes:

    • Life-and-death studies, for instance when the relative costs and benefits of a medicine need to be assessed
    • Some background statistics on the general characteristics of a population for an undergraduate essay or a press release
  • Clearly, spending a significant amount of time to obtain the most precise estimates makes more sense in the former rather than the latter case.

    • Precision comes at a cost, mostly but not only in terms of time.

Subpopulation and sample size

  • Given some survey data, most statistical software will compute near identical weighted point estimates irrespective of whether the proper sample design was accounted for.

  • What will differ will be the variability and therefore the precision of the estimate ie confidence intervals and/or standard errors.

  • The effect of not accounting for the sample design is likely to be larger, the smaller the number of observations:

    • Small sample of a population
    • Also small groups of a larger sample
  • Therefore (lack of) precision will matter more, the more specific the analysis.

The software side of things

tl;dr: ‘weights’ do not always mean ‘survey weights’

  • The notion of weight in statistics extends beyond what we have seen so far in the context of survey estimation.

  • As a result, ‘weighting’ functions used by statistical software do not always - or even mainly - treat weights as survey weights

  • Two usages of weight in most statistical software:

    • Command-specific weights are specified in the context of individual commands and are often not meant to be used with survey weights
    • Survey weights: ie weights used in the context of estimation procedures specific to the survey design
  • Use command-specific weighting at your own risk: some packages are known to produce incorrect results. SPSS, SAS, and sometimes Stata will produce wildly incorrect variance estimates when used with grossing weights.
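In R the contrast is between passing weights to an ordinary function and declaring them as survey weights. The point estimates agree; only the survey route yields a valid standard error. (Assumes the lfs data loaded earlier and the survey package installed.)

```r
library(survey)
# Command-specific weighting: the point estimate is correct, but
# this route says nothing valid about its standard error.
weighted.mean(lfs$AGE, w = lfs$PWT22, na.rm = TRUE)

# Survey weighting: declare the design (weights only here), then
# use a design-aware estimation function.
des <- svydesign(ids = ~1, weights = ~PWT22, data = lfs)
svymean(~AGE, des, na.rm = TRUE)  # same mean, design-based SE
```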

A plea for focusing on confidence intervals

  • Let’s take one step back from point estimates

  • What are we in fact estimating?

    • A single value, which will almost certainly be wrong, but may feel reassuring/easier to sell to an audience, or…

    • An interval within which the population value is likely to lie, and which is therefore more likely to be accurate

    • … And more importantly:

      • Reflect the inherent uncertainty of statistical estimation
      • And that of the world we are studying

3. Estimation scenarios

Four options in practice

  • Estimation accounting for survey weights and survey design variables using survey-specific commands
  • Estimation accounting for survey weights only using survey-specific commands
  • Estimation using weighted standard commands
  • (Unweighted estimation)

When survey design variables are available

  1. Find out about the survey design and identify the relevant weights and survey design variables using the data documentation
  2. Declare the survey design using software-specific commands
  3. Produce the estimates of interest, using survey design specific estimation commands available
  4. Document the confidence interval for the estimate of interest, or alternatively the point estimate and its standard error.
  5. If required, provide a brief discussion of the possible sources of bias in the results (specifically, under/over-estimation of the uncertainty of the estimates)
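Steps 2-4 can be sketched with the survey package; the variable names here (psu, strata, weight, outcome, mydata) are hypothetical placeholders:

```r
library(survey)
# Steps 2-4 sketched with hypothetical variable names.
des <- svydesign(ids = ~psu, strata = ~strata,
                 weights = ~weight, data = mydata)
est <- svymean(~outcome, des, na.rm = TRUE)
confint(est)  # report the 95% confidence interval...
SE(est)       # ...or the point estimate with its standard error
```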

SD variables not available in the EUL version, but present in the restricted dataset

  • Do a cost-benefit analysis of applying for a version of the data that includes the SD variables, a process that can take some time.
  • Information about how to apply for Secure Lab access is available on the ‘Apply to access controlled data in SecureLab’ page of the UK Data Service website.

SD variables unavailable/ costly to acquire

  • Check whether the estimates have been published by the data producer, in which case computation may not be needed

  • Find out about the weights in the data documentation

  • Declare the survey design as SRS using survey commands

  • Compute your estimates, using survey estimation commands then:

    • Check whether the data producer has published design effects (a Deft published for the same population at another point in time may also be usable)
    • If so, adjust standard errors/CI accordingly
    • If no Deft are available, report SRS estimates; make it clear that the results are likely to be biased
  • The wider the SE or CI under SRS assumptions, the larger the likely bias

  • Or: the smaller the (sub)sample, the larger the likely bias

  • In case of significance testing, good practice is to rely on .01 or .001 significance thresholds.

One slide summary

  • Using survey design functions is often the safer thing to do, even when survey design variables are not present
  • Do not use on-the-fly weighting with grossing weights in SAS or SPSS
  • Do a cost-benefit analysis of the sensitivity of your research vs the time cost of trying to obtain high quality estimates
  • Read the data documentation/look up the data producer’s website for design factors or published estimates
  • Try to think of/report your estimates as a range of values (ie confidence intervals) rather than a point estimate.

References

Kish, Leslie. 1995. Survey Sampling. Wiley Classics Library. New York: Wiley.